A Batch Mode Active Learning for Networked Data

نویسندگان

  • Lixin Shi
  • Yuhang Zhao
  • Jie Tang
چکیده

We study a novel problem of batch mode active learning for networked data. In this problem, data instances are connected with links and their labels are correlated with each other, and the goal of batch mode active learning is to exploit the link-based dependencies and node-specific content information to actively select a batch of instances to query the user for learning an accurate model to label unknown instances in the network. We present three criteria (i.e., minimum redundancy, maximum uncertainty and maximum impact) to quantify the informativeness of a set of instances, and formalize the batch mode active learning problem as selecting a set of instances by maximizing an objective function which combines both link and content information. As solving the objective function is NP-hard, we present an efficient algorithm to optimize the objective function with a bounded approximation rate. To scale to real large networks, we develop a parallel implementation of the algorithm. Experimental results on both synthetic data sets and real-world data sets demonstrate the effectiveness and efficiency of our approach.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An Optimization Based Framework for Dynamic Batch Mode Active Learning

Active learning techniques have gained popularity in reducing human effort to annotate data instances for inducing a classifier. When faced with large quantities of unlabeled data, such algorithms automatically select the salient and representative samples for manual annotation. Batch mode active learning schemes have been recently proposed to select a batch of data instances simultaneously, ra...

متن کامل

Designinga Neuro-Sliding Mode Controller for Networked Control Systems with Packet Dropout

This paper addresses control design in networked control system by considering stochastic packet dropouts in the forward path of the control loop. The packet dropouts are modelled by mutually independent stochastic variables satisfying Bernoulli binary distribution. A sliding mode controller is utilized to overcome the adverse influences of stochastic packet dropouts in networked control system...

متن کامل

Discriminative Batch Mode Active Learning

Active learning sequentially selects unlabeled instances to label with the goal of reducing the effort needed to learn a good classifier. Most previous studies in active learning have focused on selecting one unlabeled instance to label at one time while retraining in each iteration. Recently a few batch mode active learning approaches have been proposed that select a set of most informative un...

متن کامل

Convex Batch Mode Active Sampling via alpha-relative Pearson Divergence

Active learning is a machine learning technique that trains a classifier after selecting a subset from an unlabeled dataset for labeling and using the selected data for training. Recently, batch mode active learning, which selects a batch of samples to label in parallel, has attracted a lot of attention. Its challenge lies in the choice of criteria used for guiding the search of the optimal bat...

متن کامل

Near-optimal Batch Mode Active Learning and Adaptive Submodular Optimization

Active learning can lead to a dramatic reduction in labeling effort. However, in many practical implementations (such as crowdsourcing, surveys, high-throughput experimental design), it is preferable to query labels for batches of examples to be labelled in parallel. While several heuristics have been proposed for batch-mode active learning, little is known about their theoretical performance. ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011